feat(power): measured-power multinode support (workers[] + per-stage joules) by arygupt · Pull Request #405 · SemiAnalysisAI/InferenceX-app

arygupt · 2026-05-28T22:33:08Z

What

App-side support for the InferenceX runner's multinode measured-power telemetry — the dashboard counterpart to SemiAnalysisAI/InferenceX#1574.

Surfaces, end-to-end (ingest → DB → transform → types):

Per-worker power — workers[], each labeled prefill / decode with num_gpus, hosts[], and per-worker avg_power_w (+ optional temp/util/mem).
Per-stage joules (disagg) — joules_per_input_token = prefill energy ÷ input; joules_per_output_token = decode energy ÷ output (per-stage, decode GPUs only). joules_per_total_token stays cluster-wide.
Cluster-wide telemetry — avg_temp_c, peak_temp_c, avg_util_pct, avg_mem_used_mb.

Changes

db: migration 006_benchmark_results_workers.sql; ETL mapper + ingest + queries + json-provider carry workers[] and the new metric scalars.
constants: metric-keys lists the measured-power fields.
app: benchmark-transform + inference types surface the fields; unofficial-run route passes them through; cypress measured-power-overlay e2e.

Schema note (pairs with the runner change)

joules_per_output_token is now the per-stage decode value on disagg (symmetric with joules_per_input_token), matching the reviewer's spec on #1574. The old joules_per_output_token_decode key is removed (folded into joules_per_output_token) — it was only plumbed through, never rendered; the J/output chart already reads joules_per_output_token.

Validation

app 60 + db 223 unit tests pass
tsc --noEmit clean; lint (oxlint) 0 warnings/errors

Depends on

Runner PR SemiAnalysisAI/InferenceX#1574 (emits these agg-JSON fields). Land/coordinate together.

🤖 Generated with Claude Code

Note

Medium Risk
Adds a schema migration and materialized-view rebuild plus broad ETL/API contract changes; behavior is well-tested but must stay aligned with the companion runner PR and production migration order.

Overview
Adds end-to-end support for multinode / disaggregated measured-power telemetry from the runner: per-worker breakdowns, disagg role-split scalars, and cluster-wide temp/util/memory metrics.

Database & ingest: Migration 006 adds a dedicated benchmark_results.workers JSONB column (kept out of flat metrics), recreates latest_benchmarks, and extends bulk ingest upserts. ETL gains extractWorkers() with validation, excludes workers from numeric metrics, and registers new scalar keys in METRIC_KEYS.

Read path: Benchmark queries, json-provider dumps, API BenchmarkRow, unofficial-run normalization, and benchmark-transform all pass through workers[] plus scalars like prefill_avg_power_w, decode_avg_power_w, joules_per_input_token, and avg_temp_c—preserving undefined vs zero for legacy rows.

App types & tests: WorkerPower / extended AggDataEntry types document disagg semantics (joules_per_output_token as per-stage decode on disagg). Unit tests cover mapper, transform, and unofficial-run routes; Cypress verifies measured-energy Y-axis options on an unofficial-run overlay.

^{Reviewed by Cursor Bugbot for commit db9b15d. Bugbot is set up for automated code reviews on this repo. Configure here.}

…orm layer The multinode runner emits per-role power scalars (prefill_avg_power_w, decode_avg_power_w, joules_per_input_token, joules_per_output_token_decode) and a per-worker breakdown array on disagg runs. Surface them on BenchmarkRow -> AggDataEntry -> InferenceData so chart features can consume them without further transform changes. Workers ride as a sibling of the scalar metrics dict to keep the existing Record<string, number> index intact.

App-side support for the InferenceX runner's multinode measured-power telemetry: per-worker power breakdown (workers[] with role / num_gpus / hosts), per-stage joules (input = prefill energy, output = decode energy), and cluster-wide temp / util / mem. - db: migration 006_benchmark_results_workers.sql; ETL mapper + ingest + queries + json-provider carry workers[] and the new metric scalars - constants: metric-keys lists the measured-power fields; joules_per_output_token is per-stage decode on disagg (no separate _decode key) - app: benchmark-transform + inference types surface the fields; unofficial-run route passes them through; cypress measured-power-overlay e2e - drop joules_per_output_token_decode (folded into joules_per_output_token) to match the runner's per-stage change Validated: app 60 + db 223 unit tests pass; tsc --noEmit clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-05-28T22:33:14Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 28, 2026 11:27pm

claude · 2026-05-28T22:33:23Z

Claude finished @arygupt's task in 2m 58s —— View job

Claude Code Review

Read changed files
Analyze multinode transform / type changes
Check tests cover the new transform logic
Post review feedback

🟢 LGTM — no blocking issues found.

Reviewed the multinode measured-power plumbing end-to-end. The diff is clean and defensive at every JSON boundary:

extractWorkers (packages/db/src/etl/benchmark-mapper.ts:263) correctly distinguishes "missing field" from "field == 0" via parseNum/parseInt2 returning undefined only for unparseable values — the frontend worker with num_gpus: 0, avg_power_w: 0 case is exercised by the tests.
rowToAggDataEntry (packages/app/src/lib/benchmark-transform.ts:74) narrows row.workers with Array.isArray so a null JSONB column or a malformed payload survives without poisoning downstream consumers (verified by the 'oops' as any test).
bulkIngestBenchmarkRows encodes missing workers as JSON null so the heterogeneous jsonb[] unnest stores SQL NULL — correct for the new nullable column.
Migration 006_benchmark_results_workers.sql correctly drops + recreates the latest_benchmarks materialized view so br.* picks up the new column, and re-creates both indexes (incl. the (config_id, conc, isl, osl) unique index needed for REFRESH … CONCURRENTLY).
The unofficial-run path (route.ts:60) passes workers through symmetrically, and cypress/e2e/measured-power-overlay.cy.ts exercises the overlay code path against a real GHA run — satisfying CLAUDE.md's overlay-support requirement.

Test coverage is solid: dedicated unit tests for extractWorkers edge cases (non-array, empty, missing required fields, string-numeric coercion, malformed hosts), workers propagation through mapBenchmarkRow/rowToAggDataEntry/normalizeArtifactRows, plus the new role-split scalars and cluster-wide temp/util/mem fields.

Minor observation (non-blocking): the WorkerPower shape is now defined three times — packages/db/src/etl/benchmark-mapper.ts:80, packages/db/src/queries/benchmarks.ts:8 (as BenchmarkWorkerRow), and packages/app/src/components/inference/types.ts:35. They're structurally identical so TypeScript accepts cross-boundary assignment, but a future schema change will need to be made in three places. Worth consolidating into @semianalysisai/inferencex-constants at some point.
· feat/measured-power-multinode-app

…serting visibility The measured-power-overlay e2e asserted `.should('be.visible')` on the gated "Measured Energy" group + its options right after opening the Y-axis dropdown. That list is a scroll container (searchable-select: max-h-72 overflow-y-auto) and the group sits below the fold, so the items are in the DOM but clipped by the overflow ancestor. Cypress `be.visible` does not auto-scroll, so it failed deterministically on both the chrome and firefox e2e shard-2 jobs. The sibling test passed only because `.click()` auto-scrolls its target into view. Add `.scrollIntoView()` before each visibility assertion. Verified locally: measured-power-overlay.cy.ts 2/2 passing against the real run 26312107787 artifact with E2E_FIXTURES=1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

claude · 2026-05-28T23:09:37Z

Claude finished @arygupt's task in 50s —— View job

Claude Code Review (Re-review)

Identify what changed since last review
Check the new commit
Verify test fix correctness
Post re-review summary

🟢 Re-review: LGTM — no blocking issues found.

Only one new commit since the previous review (9847aef): a focused Cypress fix that adds .scrollIntoView() before each should('be.visible') assertion on the gated "Measured Energy" dropdown group in packages/app/cypress/e2e/measured-power-overlay.cy.ts.

Root-cause analysis in the commit message matches the code:

The searchable-select list is a max-h-72 overflow-y-auto scroll container, so the new "Measured Energy" group renders below the fold.
Cypress be.visible does not auto-scroll, which is why the assertion failed deterministically on both chrome and firefox e2e shard-2 jobs reported by Blacksmith.
The second test passed because .click() does auto-scroll its target — consistent with the diagnosis.
The fix is the smallest possible: .scrollIntoView() chained before each .should('be.visible'). No production code touched.

Previous review feedback on the multinode telemetry plumbing (extractWorkers parsing, JSONB null handling, materialized-view recreation, overlay path symmetry) all still applies and remains addressed. The non-blocking observation about consolidating the WorkerPower shape into @semianalysisai/inferencex-constants is unchanged — still optional, not introduced by this commit.
· feat/measured-power-multinode-app

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 9847aef. Configure here.}

…age dup The read-side BenchmarkWorkerRow interface in queries/benchmarks.ts was a verbatim redeclaration of the ingest-side WorkerPower in etl/benchmark-mapper.ts. Alias it to the canonical definition so the per-worker shape can't drift within the package, keeping the BenchmarkWorkerRow name its consumers reference. Addresses the Cursor Bugbot "triplicate WorkerPower interface" finding on #405. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

claude · 2026-05-28T23:26:48Z

Claude finished @arygupt's task in 52s —— View job

Claude Code Review (Re-review)

Identify what changed since last review
Verify refactor correctness
Post re-review summary

🟢 Re-review: LGTM — no blocking issues found.

One new commit since the previous review (db9b15d): a focused refactor that aliases BenchmarkWorkerRow to WorkerPower in packages/db/src/queries/benchmarks.ts:11, addressing the Cursor Bugbot "triplicate WorkerPower interface" finding (and the same observation in my prior review).

Verified:

The two in-package shapes were field-for-field identical (role, worker_idx, hosts?, num_gpus, avg_power_w, avg_temp_c?, peak_temp_c?, avg_util_pct?, avg_mem_used_mb?) — type BenchmarkWorkerRow = WorkerPower is a safe drop-in.
The public name BenchmarkWorkerRow is preserved, so external consumers (json-provider, BenchmarkRow.workers) are unaffected.
Scope is correctly limited to the in-db-package duplication. The third copy in packages/app/src/components/inference/types.ts stays — consolidating that would require moving the shape into @semianalysisai/inferencex-constants, which is a larger cross-package change appropriate for a follow-up.

All prior review findings on the multinode telemetry plumbing (extractWorkers parsing, JSONB null handling, materialized-view recreation, overlay path symmetry, test coverage) remain addressed and still apply.
· feat/measured-power-multinode-app

arygupt and others added 2 commits May 28, 2026 10:46

arygupt requested a review from adibarra as a code owner May 28, 2026 22:33

This comment has been minimized.

Sign in to view

vercel Bot deployed to Preview May 28, 2026 23:10 View deployment

cursor Bot reviewed May 28, 2026

View reviewed changes

Comment thread packages/db/src/etl/benchmark-mapper.ts

vercel Bot deployed to Preview May 28, 2026 23:27 View deployment

arygupt merged commit 183f9b1 into master May 28, 2026
18 checks passed

arygupt deleted the feat/measured-power-multinode-app branch May 28, 2026 23:46

arygupt mentioned this pull request May 29, 2026

fix(db): tolerate missing workers column in benchmark read queries #407

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(power): measured-power multinode support (workers[] + per-stage joules)#405

feat(power): measured-power multinode support (workers[] + per-stage joules)#405
arygupt merged 4 commits into
masterfrom
feat/measured-power-multinode-app

arygupt commented May 28, 2026 •

edited by cursor Bot

Loading

Uh oh!

vercel Bot commented May 28, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 28, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

claude Bot commented May 28, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

claude Bot commented May 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arygupt commented May 28, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Changes

Schema note (pairs with the runner change)

Validation

Depends on

Uh oh!

vercel Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Uh oh!

This comment has been minimized.

claude Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review (Re-review)

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review (Re-review)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arygupt commented May 28, 2026 •

edited by cursor Bot

Loading

vercel Bot commented May 28, 2026 •

edited

Loading

claude Bot commented May 28, 2026 •

edited

Loading

claude Bot commented May 28, 2026 •

edited

Loading

claude Bot commented May 28, 2026 •

edited

Loading